Suppose you are given a graph consisting of $N$ pairs of $\{x, y\}$ values, and that the values of the ordinate $\{y_i\}$ are subject to a constant, but unknown, amount of noise $\sigma$. The task is to fit a set of $M$ parameters $\{a_j\}$ so that the $\{y_i\}$ can be adequately represented in terms of a set of basis functions $\{f_j(x)\}$:
$$ y(x_i) \approx \hat y(x_i) = \sum_{j=1}^M a_j f_j(x_i) $$

or, in matrix notation, ${\bf \hat y} = {\bf f}\cdot{\bf a}$, where ${\bf f}$ is an $(N\times M)$ matrix with entries $f_{ij} = f_j(x_i)$. The functions $\{f_j\}$ might, for example, be a set of polynomials, or a Fourier series.
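Before turning to the inference, it may help to make the matrix form concrete. Below is a minimal sketch, assuming a monomial (polynomial) basis; the names `design_matrix` and `a_true` and the specific numbers are illustrative choices of ours, not from Gull's paper.

In [ ]:
import numpy as np

def design_matrix(x, M):
    """The (N x M) matrix f whose column j holds the basis function x**j
    evaluated at the abscissae, i.e. f[i, j] = x[i]**j (monomial basis)."""
    return np.vander(x, M, increasing=True)  # columns: 1, x, x**2, ..., x**(M-1)

x = np.linspace(0.0, 1.0, 11)        # N = 11 abscissa values
a_true = np.array([1.0, -2.0, 0.5])  # M = 3 illustrative parameters
F = design_matrix(x, len(a_true))
y_hat = F @ a_true                   # the matrix product f . a

Swapping `np.vander` for a matrix of sines and cosines would give the Fourier-series case without changing anything downstream.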
The above statement by Stephen Gull is from his paper *Bayesian inductive inference and maximum entropy*, which we shall follow in this notebook to discuss the Bayesian solution to the problem of univariate regression.
We start, following Gull, with the task of determining the parameters $\{a_j\}$ when $M$ and $\sigma$ are known in advance. Then ${\bf a} = (a_1, a_2, \ldots, a_M)$ are the unknown parameters to be determined, and Bayes' theorem gives the probability distribution of the parameters, given the data $D$, to be
$$ P({\bf a} | D) = \frac{P(D | {\bf a})\, P({\bf a})}{\int P(D | {\bf a})\, P({\bf a})\, d^M a} $$

The maximum entropy distribution for the noise is a Gaussian distribution with zero mean and variance $\sigma^2$.
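To spell out what this noise model implies: if each $y_i$ deviates from $\hat y(x_i)$ by independent Gaussian noise of variance $\sigma^2$, the likelihood of the data is

$$ P(D | {\bf a}) = (2\pi\sigma^2)^{-N/2} \exp\left(-\frac{\chi^2}{2}\right), \qquad \chi^2 = \frac{1}{\sigma^2}\sum_{i=1}^N \big(y_i - \hat y(x_i)\big)^2, $$

so that, with a uniform prior on ${\bf a}$, maximizing the posterior is the familiar least-squares (minimum $\chi^2$) problem.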
In [ ]:
import numpy as np
from matplotlib import pyplot as plt

# Plot ten independent samples of zero-mean, unit-variance Gaussian noise
# (the maximum entropy noise model above, with sigma = 1).
plt.plot(np.random.randn(10))
plt.show()
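Putting the pieces together: with $M$ and $\sigma$ known and a flat prior, the posterior above is maximized by the least-squares estimate $\hat{\bf a} = ({\bf f}^T{\bf f})^{-1}{\bf f}^T{\bf y}$, with posterior covariance $\sigma^2\,({\bf f}^T{\bf f})^{-1}$. The sketch below runs this on synthetic data; the "true" parameters, noise level, and seed are invented for illustration.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1                                     # noise level, assumed known
x = np.linspace(0.0, 1.0, 50)
a_true = np.array([1.0, -2.0, 0.5])             # illustrative "true" parameters
F = np.vander(x, len(a_true), increasing=True)  # (N x M) design matrix, monomial basis
y = F @ a_true + sigma * rng.standard_normal(x.size)

# Posterior maximum under a flat prior = the least-squares solution
a_hat, *_ = np.linalg.lstsq(F, y, rcond=None)
cov_a = sigma**2 * np.linalg.inv(F.T @ F)       # posterior covariance of the parameters
print(a_hat)                                    # estimated parameters
print(np.sqrt(np.diag(cov_a)))                  # one-sigma error bars on each a_j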
In [ ]: